# Competitive Experience Replay (CER)

## 1 Overview

Competitive Experience Replay (CER) is a strategy for goal-directed RL with sparse reward. In CER, a pair of agents, $$\pi _A$$ and $$\pi _B$$, are trained simultaneously.

Agent $$\pi _A$$ is penalized whenever agent $$\pi _B$$ has also visited a nearby state, i.e. $$|s_A^i - s_B^j| < \delta$$, whereas agent $$\pi _B$$ is rewarded for visiting states near those of agent $$\pi _A$$. This reward re-labelling is applied per mini-batch.
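The re-labelling rule above can be sketched with NumPy. The function name `relabel_cer` and the unit reward/penalty per nearby pair are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def relabel_cer(states_a, states_b, delta=0.1):
    """Hypothetical per-mini-batch CER re-labelling sketch.

    states_a, states_b: (N, dim) arrays of states visited by pi_A / pi_B.
    Returns additive reward terms for each agent's transitions.
    """
    # Pairwise Euclidean distances between A's and B's states
    dist = np.linalg.norm(states_a[:, None, :] - states_b[None, :, :], axis=-1)
    near = dist < delta
    # pi_A is penalized for each of B's states within delta of its own ...
    bonus_a = -near.sum(axis=1).astype(np.float64)
    # ... while pi_B is rewarded for each of A's states it came close to
    bonus_b = near.sum(axis=0).astype(np.float64)
    return bonus_a, bonus_b
```

In practice these bonuses would be added to the sparse task reward of each sampled transition before the gradient step.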

Depending on how $$\pi _B$$ is initialized, there are two variants of CER. Independent-CER initializes $$\pi _B$$ from the task's initial state distribution. Interact-CER initializes $$\pi _B$$ from states randomly sampled from $$\pi _A$$'s off-policy experience.
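The difference between the two variants is only where B's start state comes from; a minimal sketch (function names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng()

def init_state_independent(initial_state_sampler):
    # Independent-CER: draw B's start state from the task's own
    # initial state distribution (same as A's resets)
    return initial_state_sampler()

def init_state_interact(a_visited_states):
    # Interact-CER: draw B's start state from states pi_A has
    # already visited (a random off-policy sample from A's buffer)
    idx = rng.integers(len(a_visited_states))
    return a_visited_states[idx]
```

Resetting the environment to an arbitrary sampled state, as Interact-CER requires, assumes the simulator exposes such a reset.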

The authors point out that CER is complementary to Hindsight Experience Replay (HER): combining the two improves performance further.

## 2 With cpprb

Thanks to its flexible buffer design, cpprb can store a pair of transitions from the two agents in a single buffer. However, the current version of cpprb does not support sampling whole episodes instead of individual transitions.